Fast model-based estimation of ancestry in unrelated individuals.

Authors

David H Alexander

John Novembre

Kenneth Lange

Abstract

Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
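The abstract refers to a likelihood model shared with structure and to an EM algorithm as the slower baseline. As a rough illustration only, the sketch below writes down the binomial admixture log-likelihood in its usual form and one plain EM update of the admixture fractions and allele frequencies. It is not ADMIXTURE's block relaxation, sequential quadratic programming, or quasi-Newton acceleration, and it is not FRAPPE's code. The names `G`, `Q`, `F`, `log_likelihood`, and `em_step` are hypothetical, and the toy genotype matrix is made up for demonstration.

```python
import math
import random


def log_likelihood(G, Q, F):
    """Binomial admixture log-likelihood, up to an additive constant.

    G[i][j]: copies (0, 1 or 2) of the reference allele, individual i, marker j
    Q[i][k]: fraction of individual i's ancestry from population k (rows sum to 1)
    F[k][j]: reference-allele frequency of marker j in population k
    """
    total = 0.0
    for g_row, q_row in zip(G, Q):
        for j, g in enumerate(g_row):
            p = sum(q * f_row[j] for q, f_row in zip(q_row, F))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def em_step(G, Q, F, eps=1e-9):
    """One plain EM update of (Q, F). A baseline sketch, not an SQP block update."""
    n, m, K = len(G), len(G[0]), len(F)
    new_Q = [[0.0] * K for _ in range(n)]
    ref = [[0.0] * m for _ in range(K)]  # expected reference copies drawn from k
    tot = [[0.0] * m for _ in range(K)]  # expected allele copies drawn from k
    for i in range(n):
        for j in range(m):
            g = G[i][j]
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            for k in range(K):
                a = g * Q[i][k] * F[k][j] / p
                b = (2 - g) * Q[i][k] * (1.0 - F[k][j]) / (1.0 - p)
                new_Q[i][k] += (a + b) / (2.0 * m)
                ref[k][j] += a
                tot[k][j] += a + b
    # Clamp frequencies away from 0 and 1 so the logarithms stay finite.
    new_F = [[min(1.0 - eps, max(eps, ref[k][j] / tot[k][j])) for j in range(m)]
             for k in range(K)]
    return new_Q, new_F


# Toy data (made up): 5 individuals, 6 markers, K = 2 ancestral populations.
G = [[2, 2, 1, 0, 0, 1],
     [2, 1, 2, 0, 1, 0],
     [0, 0, 1, 2, 2, 1],
     [0, 1, 0, 2, 1, 2],
     [1, 1, 1, 1, 1, 1]]
random.seed(0)
K = 2
Q = []
for _ in G:
    w = [random.uniform(0.2, 0.8) for _ in range(K)]
    Q.append([x / sum(w) for x in w])
F = [[random.uniform(0.1, 0.9) for _ in G[0]] for _ in range(K)]

history = [log_likelihood(G, Q, F)]
for _ in range(20):
    Q, F = em_step(G, Q, F)
    history.append(log_likelihood(G, Q, F))
```

Each EM step should leave the log-likelihood no lower than before, which is the property that makes EM a safe but slow baseline. The speed-ups described in the abstract come from replacing this update with faster block updates and an acceleration scheme.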

Similar resources

Using ancestry matching to combine family-based and unrelated samples for genome-wide association studies.

We propose a method to analyze family-based samples together with unrelated cases and controls. The method builds on the idea of matched case-control analysis using conditional logistic regression (CLR). For each trio within the family, a case (the proband) and matched pseudo-controls are constructed, based upon the transmitted and untransmitted alleles. Unrelated controls, matched by genetic a...

Robust Population Structure Inference and Correction in the Presence of Known or Cryptic Relatedness (Running Title: Population Structure Inference in Related Samples; Keywords: Admixture, Population Structure, Principal Components Analysis, Relatedness)

Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestr...

Improving pedigree-based linkage analysis by estimating coancestry among families.

We present a method for improving the power of linkage analysis by detecting chromosome segments shared identical by descent (IBD) by individuals not known to be related. Existing Markov chain Monte Carlo methods sample descent patterns on pedigrees conditional on observed marker data. These patterns can be stored as IBD graphs, which express shared ancestry only, rather than specific family r...

The Study Predictors of Fast-Food Consumption based on the Prototype/Willingness Model in Students of Public Health School, Rafsanjan City, Iran

Introduction: In recent decades, a significant increase has been observed in the average weight of people due to fast food consumption, which increases the risk of developing diabetes and cardiovascular diseases. Given the importance of this issue, this study was conducted to investigate the predictors of fast food consumption based on the Prototype/Willingness Model among students the School...

Local Ancestry Inference in a Large US-Based Hispanic/Latino Study: Hispanic Community Health Study/Study of Latinos (HCHS/SOL)

We estimated local ancestry on the autosomes and X chromosome in a large US-based study of 12,793 Hispanic/Latino individuals using the RFMix method, and we compared different reference panels and approaches to local ancestry estimation on the X chromosome by means of Mendelian inconsistency rates as a proxy for accuracy. We developed a novel and straightforward approach to performing ancestry-...

Journal:

Genome Research

Volume 19, Issue 9

Publication date: 2009
